(54) Apparatus and method for retrieving data from a network site



(57) An apparatus for retrieving data from a first network site for use by a second network site utilizes a template that specifies the location of the data within a response solicited from the first network site. The template is a mark-up document having a similar format to the response and thus it is not an application program. A marker is included within the template to determine the location of the data within the response. A matching mechanism, which may be used with any template, is utilized to compare the template with the response to determine the exact location of the data within the response. The data may be retrieved when its location within the response is ascertained. Once retrieved, the data may be used by the second network site for display in a format that is specified by the second network site. Accordingly, data is located within the response with a template and not with a scanning application program.
Description

Field of the Invention 

[0001] This invention generally relates to data transmission networks and, more particularly, to retrieving data from a first network site via a second network site.

Background of the Invention 

[0002] The World Wide Web is a collection of servers connected to the Internet that utilize the Hypertext Transfer Protocol ("HTTP"). HTTP is a known application protocol that provides users with access to documents (e.g., web pages) written in a standard mark-up page description language known as Hypertext Markup Language ("HTML"). HTTP is used to transmit HTML web pages between a remote computer (e.g., a server) and a local computer in a form that is understandable to browser software (e.g., Netscape Navigator™, available from Netscape Communications Corporation of Mountain View, California) executing on the local computer.

[0003] Among a number of basic document formatting functions, HTML enables software developers to specify graphical pointers (commonly referred to as "hyperlinks") on displayed web pages ("base web pages") that point to other web pages ("remote web pages") typically resident on remote servers. Once the remote web page is displayed, a user of a local computer system may freely review its contents and perform any functions that it provides. One such function, for example, may be obtaining specified data ("data") from the remote site. After the data is retrieved, it may be displayed by the local computer system in a selected format specified by the remote web page. Problems may arise, however, when utilizing such web page functions. Primarily, access to the data through the remote web page interface may be cumbersome and thus not intuitive to the user. Accordingly, the user may not be able to retrieve the desired data from the remote site. Similarly, even if the data is retrieved from the remote site, its display in the selected format also may be cumbersome and thus not in a form that is easily understood by the user.

[0004] The art has responded to these and similar problems by enabling a base web site to automatically extract data from a remote web page, and then display the retrieved data in a format specified by the base web site. Accordingly, the base web site, and not the user, accesses the remote page to retrieve the data. A typical process that may be used for retrieving and displaying such data may begin when a user requests the data while accessing a base web page. In response, the base web site directs a data request to the remote site requesting the data. After receiving the request, the remote site typically generates a response web page having the data. The response web page then is directed to the base web page for processing.



[0005] Instead of displaying the response web page which, undesirably, is in a form specified by the remote site, the base site executes a specially designed scanning procedure that scans the response web page for the data. Once the data is retrieved from the response by the scanning procedure, it may be displayed, via the base web page, in a format that is designed specially by the base web page.

[0006] As noted above, the scanning procedure is specially designed to retrieve the data from the remote web page. Such a scanning procedure is implemented by writing an application program that utilizes either conventional procedural or object-oriented programming techniques. To be effective, such a program must be preconfigured with the location of the data to be retrieved within the remote web page. Accordingly, a new scanning application program must be written each time the format of a response web page is modified. Developing such new scanning application programs is very time-consuming, however, thus adding to the overall cost of developing and maintaining the base web site.

[0007] It therefore would be desirable to have a method and apparatus that enables a base web page to efficiently retrieve information from a remotely linked web site without requiring that a scanning program be developed.

Summary of the Invention 

[0008] In accordance with one aspect of the invention, an apparatus for retrieving data from a first network site for use by a second network site utilizes a template that specifies the location of the data within a response solicited from the first network site. The template is a mark-up document having a similar format to the response and thus it is not an application program. A marker is included within the template to determine the location of the data within the response. A matching mechanism, which may be used with any template, is utilized to compare the template with the response to determine the exact location of the data within the response. The data may be retrieved when its location within the response is ascertained. Once retrieved, the data may be used by the second network site as specified by the second network site. Accordingly, the location of the data is ascertained within the response with a template and not with a scanning application program.

[0009] In accordance with another aspect of the invention, a method of retrieving data from a first network site for use by a second network site first directs a request for the data to the first network site. Receipt of the request by the first network site generates a response having a predetermined location for the data. The method then receives a template having a marker for identifying the predetermined location of the data within the response. The marker in the template then is matched with the data in the response. Through this matching operation, a variable within the marker is assigned the same value as the data. Finally, the variable is provided to the second network site for any use.

[0010] In accordance with yet another aspect of the invention, the step of matching for the above-noted method first determines the location of the predetermined location in both the template and the response. The value of the data in the predetermined location then is assigned to the variable.

Brief Description of the Drawings

[0011] The foregoing and other objects and advantages of the invention will be appreciated more fully from the following further description thereof with reference to the accompanying drawings wherein:

Figure 1 schematically shows a commonly used network arrangement in which a local computer system may communicate with various network sites via the Internet.

Figure 2 is a flow chart showing the more significant steps of a process for retrieving data requested by a user of a local computer system, and then displaying the requested data on the local computer system.

Figures 3A and 3B show a preferred process that may be used by a matching mechanism for locating variable information within an HTML response.

Detailed Description of Illustrative Embodiments

[0012] Figure 1 schematically shows a commonly used network arrangement in which a local computer system 100 may communicate with various network sites via the Internet 101. When utilized with a preferred embodiment of the invention, the network sites include a base World Wide Web site ("base web site 102") for direct access by a user of the local computer system 100, a remote World Wide Web site ("remote web site 104") that, as discussed in detail below, is accessed by the base web site 102, and a plurality of other World Wide Web sites as shown by the ellipses. By way of example, the base web site 102 may be operated by a software distributor, and the remote web site 104 may be operated by a shipping company that ships software for the software distributor. It should be noted that the base web site 102 and the remote web site 104 may be on the same hardware device (e.g., a network server), or on different hardware devices that communicate through the Internet 101.

[0013] In accordance with a preferred embodiment of the invention, in response to a local computer system user's request for data, the base web site 102 is constructed to retrieve the requested data from the remote network site, and then display the retrieved data on a display device at the local computer system 100, in a preselected format specified by the base web site 102. The user therefore does not directly access the remote web site 104. A generalized flowchart showing the more significant steps of a process for retrieving and displaying the requested data is shown in figure 2.

[0014] The process shown in figure 2 starts at step 200, in which an HTML form (not shown) is displayed on the display device at the local computer system 100. The HTML form, which is generated and formatted by the base web site 102 in accordance with conventional processes, includes at least one field for entering information ("form-input") that is required for retrieving the data from the remote web site 104. Once the form-input is entered into the form, the user may select a "submit" button to transmit a message having the form-input information to the base web site 102 for processing. Continuing with the above example, the data to be retrieved may be the delivery date of a software program to the user, and the form-input information required to retrieve the delivery date may be the user's name.

[0015] The process then continues to step 202, in which the base web site 102 first extracts the form-input from the message and then generates an HTTP "POST" request. The POST request includes the form-input information which, when received by the remote web site 104, solicits an HTML response from the remote web site 104 having the requested data. The form-inputs thus are used by the remote web site 104 to retrieve the requested data from a storage device such as, for example, a database in a non-volatile storage medium.

[0016] In an alternative embodiment, the form-input information may be processed at the base web site 102 into a specified form to be used by the remote web site 104. For example, the shipping web site may require a customer number instead of a customer name. The software developer site therefore must use the form-input customer name to retrieve the customer number from a local database, and then direct that customer number to the shipping web site via the POST request. In a similar manner, the form-input information (or information derived from the form-input information) may be added to multiple POST requests that each are directed to different remote web sites. It is expected that such POST requests would solicit responses having different formats.

[0017] The process then proceeds to step 204, in which the HTML response is received by the base web site 102 for processing. At step 206, a matching mechanism matches the HTML response against a template to locate the requested data within the HTML response. The details of the matching mechanism and its interaction with the template of step 206 are the subject of figures 3A and 3B, both of which are discussed in detail below.
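
Step 202 can be illustrated with Python's standard `urllib` modules. The sketch below is a minimal example, assuming the remote web site accepts an `application/x-www-form-urlencoded` POST body; the URL `http://shipping.example.com/track`, the field name `name`, and the helper `build_post_request` are hypothetical, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(remote_url: str, form_input: dict) -> Request:
    """Step 202 sketch: turn the extracted form-input into an HTTP POST request."""
    # Encode the form-input fields as a URL-encoded request body (bytes).
    body = urlencode(form_input).encode("ascii")
    # Supplying data= together with method="POST" makes this a POST; nothing is transmitted yet.
    return Request(
        remote_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_post_request("http://shipping.example.com/track", {"name": "Jane Doe"})
print(req.get_method(), req.data)  # POST b'name=Jane+Doe'
```

Passing the request to `urllib.request.urlopen` would transmit it and return the HTML response that is received at step 204. For the alternative embodiment of paragraph [0016], the dictionary would simply carry the derived customer number instead of the customer name.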

[0018] The template used in step 206 merely is an edited version of the HTML response document utilizing Meta-HTML (MHTML) markings. Meta-HTML is described in detail in the "MAWL 2.0 Tutorial," the copyright of which is owned by Lucent Technologies and is available on the World Wide Web at "http://www.bell-labs.com/project/MAWL/tutorial.html."

[0019] MHTML also is described in detail in "Mawl 2.0 Quick Language Reference," the copyright of which is owned by Lucent Technologies and is available on the World Wide Web at "http://www.bell-labs.com/project/MAWL/quickref.html#Mhtml."

[0020] Both references are referred to herein as 
"MHTML references." 

[0021] The MHTML markings act as markers to identify the location of the data within the response document. MHTML markings used in a preferred embodiment of the invention include "MVAR," "MITER," and "/MITER." When creating the template (i.e., while editing the HTML code of the response document), the MHTML markings are inserted in the HTML code of the response in place of the variable parts of the response. Such insertion positions the MHTML markings relative to the HTML code in the response (which subsequently becomes the template) so that the location of the data may be ascertained by the matching mechanism (discussed below). For example, when a template is being constructed for the software distributor web site, the MVAR variable will be placed in the location of the shipping date if the shipping date part of the shipping web site response is the variable part of that response. Accordingly, when using that template to determine the shipping date from a shipping web site response, the matching mechanism (discussed below) must determine the relative locations of the MVAR variable and the HTML code in the template, and then apply that relationship to the response to locate the shipping date in the response.
[0022] In general terms, the MVAR marking is used to 
match text to a variable. A generalized form of an MVAR 
marking is as shown below: 

<MVAR NAME = x DELIM = string > 
[0023] As will be more apparent upon review of figures 3A and 3B and the MHTML references, the NAME attribute of the MVAR marking indicates that a character string at that location in the response under examination must be assigned to a variable named "x." The DELIM attribute identifies a string of individual characters, each of which may be used as a flag to identify the end of the character string assigned to the variable "x." Reference is made to the MHTML references, which describe the function of the MVAR marking in greater detail.
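
Before a marking can be applied, its NAME and DELIM attributes have to be read out of the template. The following sketch assumes attributes are written `KEY = value` or `KEY = "value"` with optional spaces, as in the generalized form above; the regular expression and the function name `parse_marking` are illustrative assumptions, not part of MAWL.

```python
import re

# KEY = value or KEY = "value"; the spacing and quoting rules are assumptions of this sketch.
ATTR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+?))(?=\s|>|$)')


def parse_marking(tag: str) -> dict:
    """Return the attributes of a marking such as '<MVAR NAME = x DELIM = "<;" >'."""
    return {key.upper(): quoted or bare for key, quoted, bare in ATTR.findall(tag)}


attrs = parse_marking('<MVAR NAME = x DELIM = "<;" >')
print(attrs)  # {'NAME': 'x', 'DELIM': '<;'}
```

Because DELIM names a set of characters, the value `<;` means the text assigned to "x" ends at the first `<` or `;`, whichever comes first.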
[0024] The MITER and /MITER markings (both of which are short for "MHTML iteration") are used in the template to retrieve a type of information that may be repeated within a response. For example, an unknown number of shipping dates may be used for multiple programs ordered from the software developer by the user. When used in conjunction with the MVAR marking, variable information is retrieved from the response locations corresponding to the area between the MITER and /MITER markings in the template. A generalized form of the MITER marking is as shown below:

<MITER NAME = NAMELIST CURSOR = i DELIM = string >



[0025] As will be more apparent upon review of figures 3A and 3B and the MHTML references, the NAME attribute of the MITER marking identifies the name of an array to store the repeated information. The name of the array in this example is "NAMELIST." The CURSOR attribute indicates that the character "i" will be used to represent entries in the array for data between this MITER marking and a subsequently positioned /MITER marking. As is known by those skilled in the art, the CURSOR attribute enables MITER markings to be nested within other MITER markings. The DELIM attribute identifies a string of characters, each of which may be positioned directly after the last character in the last character string of interest in the response, thus effectively marking the end of the series of character strings. As is known by those skilled in the art, MITER markings and /MITER markings cooperate to retrieve data between such markings by utilizing MVAR markings and other nested MITER and /MITER markings. Reference is made to the MHTML references, which describe the function of the MITER and /MITER markings in greater detail.
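
The iteration performed within a MITER region can be sketched for its simplest case: a sub-template made of a literal prefix, one MVAR, and a literal suffix, repeated an unknown number of times. The function `collect_repeats` and its parameters are hypothetical. CURSOR handling and nesting are omitted, and, as an assumption of this sketch, the loop stops at the first repetition that does not fit the prefix/suffix pattern rather than by consulting the MITER DELIM attribute.

```python
def collect_repeats(response: str, pos: int, prefix: str, delim: str,
                    suffix: str) -> tuple[list[str], int]:
    """Collect one value per repetition of prefix + value + suffix, MITER-style.

    Returns the list of values (the "array" named by NAME) and the response
    position just after the last complete repetition.
    """
    if not prefix and not suffix:
        raise ValueError("prefix or suffix must be non-empty to avoid looping forever")
    values = []
    while response.startswith(prefix, pos):
        start = pos + len(prefix)
        end = start
        # Like MVAR: the value runs up to the first character listed in delim.
        while end < len(response) and response[end] not in delim:
            end += 1
        if not response.startswith(suffix, end):
            break  # this repetition does not fit the sub-template; stop before it
        values.append(response[start:end])
        pos = end + len(suffix)
    return values, pos


response = "<ul><li>02 Mar 1998</li><li>09 Mar 1998</li></ul>"
dates, pos = collect_repeats(response, 4, "<li>", "<", "</li>")
print(dates)           # ['02 Mar 1998', '09 Mar 1998']
print(response[pos:])  # </ul>
```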
[0026] Returning to the flow chart shown in figure 2, after step 206, it then is ascertained whether there is a match between the template and the response (step 208). This is ascertained with the matching mechanism shown in figures 3A and 3B. A match indicates that the relative locations of the data and the HTML code in the response are the same as when the template was written. If the template and response do not match, however, then either the relative locations of the data and HTML code in the response have been modified, or the template is defective. The markings in the template are ineffective if the relative locations of the HTML code and the data in the response have been modified. Accordingly, the template must be rewritten to conform with the new relative locations of the data and the HTML code in the response.

[0027] If there is not a match at step 208, then the process proceeds to step 210, in which an error message may be displayed on a display device at the base web site 102 indicating that the data may not be retrieved through the base web page. Conversely, if there is a match at step 208, then the data is displayed at the local computer system 100 in a format specified by the base web site 102.

[0028] Figures 3A and 3B show a preferred process that may be used by the matching mechanism (step 206) for locating the variable information within the response. The process examines each character in the template with a template pointer, and each character in the HTML response with a response pointer. The process begins at step 300, in which the first character in the template is examined. It then is determined whether that character is an MVAR marking (step 302), a MITER marking (step 314), or HTML code (step 324). The process thus proceeds based upon the type of character under examination.
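
In a program, it is convenient to split the template into tokens once, so that examining the character at the template pointer becomes a test on the token kind. The Python sketch below is an assumption about how that classification might be done (the function `tokenize` is hypothetical); its regular expression recognizes only markings written as `<MVAR ...>`, `<MITER ...>` and `</MITER>`, and treats every other run of text as HTML code.

```python
import re

# One capturing group, so re.split() alternates: text, marking, text, marking, ...
MARKING = re.compile(r"(<MVAR\b[^>]*>|<MITER\b[^>]*>|</MITER\s*>)", re.IGNORECASE)


def tokenize(template: str) -> list[tuple[str, str]]:
    """Classify template pieces as MVAR, MITER, /MITER or HTML (cf. steps 302, 314, 324)."""
    tokens = []
    for i, piece in enumerate(MARKING.split(template)):
        if i % 2 == 0:
            if piece:  # split() yields empty strings around adjacent markings
                tokens.append(("HTML", piece))
        else:
            name = re.match(r"</?\s*(\w+)", piece).group(1).upper()
            tokens.append(("/" + name if piece.startswith("</") else name, piece))
    return tokens


tokens = tokenize('<ul><MITER NAME=dates><li><MVAR NAME=d DELIM="<"></li></MITER></ul>')
for kind, text in tokens:
    print(kind, text)
```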

[0029] If the character is an MVAR marking, the process moves to step 304, in which the template pointer is incremented to the DELIM attribute to determine the delimiting character (i.e., the first character immediately after the variable under examination). At step 306, the character corresponding to the DELIM character is located in the response, and the response pointer is moved to point to that character. The response pointer thus skips all of the text from the character corresponding to the beginning of the MVAR marking to the character identified by the DELIM character. It then is determined at step 308 whether the NAME attribute is defined in the MVAR marking. As is known by those skilled in the art, the NAME attribute is not defined in an MVAR marking if it is not necessary to retrieve the variable information corresponding to that MVAR marking. Accordingly, if the NAME attribute is not defined, then the process loops back to step 309, thereby moving the pointers to examine the next corresponding characters in the response and template.

[0030] If the NAME attribute is defined, then the process moves to step 310, in which the value of the character or character string that was skipped in the response is assigned to the variable in the MVAR marking. The value of the variable is stored in memory at step 312. The process then loops back to step 309, in which the pointers are moved to examine the next corresponding characters in the response and template.
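
Steps 304 through 312 can be sketched as a function that advances the response pointer and optionally records the skipped text. The sketch assumes the marking's attributes have already been parsed into a dictionary such as `{"NAME": "ship_date", "DELIM": "<"}`; the function name `match_mvar` and the `variables` dictionary (a stand-in for the memory used at step 312) are hypothetical. As in the text, the response pointer is left on the delimiting character, which the following HTML code in the template is then expected to match.

```python
def match_mvar(attrs: dict, response: str, r_ptr: int, variables: dict) -> int:
    """Steps 304-312 sketch: skip the variable text and store it if NAME is defined.

    Returns the new response pointer, which rests on the delimiting character.
    """
    delim = attrs.get("DELIM", "")
    # Step 306: advance the response pointer to the first character listed in DELIM.
    end = r_ptr
    while end < len(response) and response[end] not in delim:
        end += 1
    # Steps 308-312: keep the skipped text only when the marking names a variable.
    name = attrs.get("NAME")
    if name:
        variables[name] = response[r_ptr:end]
    return end


memory = {}
resp = "<p>Ship date: 02 Mar 1998</p>"
ptr = match_mvar({"NAME": "ship_date", "DELIM": "<"}, resp, 14, memory)
print(ptr, memory)  # 25 {'ship_date': '02 Mar 1998'}
```

Without a NAME attribute the same call still moves the pointer to offset 25 but leaves `memory` untouched, mirroring the branch from step 308 straight to step 309.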
[0031] If at step 302 the character was not an MVAR marking, then it is determined whether the character is a MITER marking (step 314). As noted above, a MITER marking indicates that a list of character strings may be retrieved from the response. If a MITER marking is detected, the process moves to step 316, in which the characters in the response corresponding to the area between the MITER and /MITER markings in the template are matched in accordance with the matching process. Specifically, the characters and markings between the MITER and /MITER markings are treated as a sub-template, and the characters in the corresponding response locations similarly are treated as a sub-response. Accordingly, the entire matching process shown in figures 3A and 3B is iterated within step 316 for the sub-template and sub-response. Once the characters between the MITER and /MITER markings are matched via the iterated process of step 316, the process loops to step 309.
[0032] If at step 314 it is determined that the character is not a MITER marking, then the process continues via off-page connector "A" to step 324 (figure 3B) to determine whether the character under examination in the template is HTML code. If such character is not HTML code, then the end of the response has been reached and the variables of interest presumably have all been retrieved. In such case, the process proceeds to step 330, in which a "match" variable (used by step 208 in the process shown in figure 2) is set to "true."
[0033] Conversely, if at step 324 it is determined that the character under examination in the template is HTML code, then the process continues to step 328, in which it is determined whether the character under examination in the response is the same HTML code as that examined in the template. If it is not the same HTML code, then the template format is different from the response format and thus information may not be retrieved for display on the local computer system 100 because the formats are incompatible. Accordingly, in that case the process continues to step 330, in which the "match" variable (used by step 208 in the process shown in figure 2) is set to "false." When the match variable is set to false, a new template must be made to retrieve the data.

[0034] Conversely, if in step 328 it is determined that the character under examination in the response is the same HTML code, then the process loops to step 309 via off-page connector "B," in which the template pointer is moved to the next MVAR or MITER marking in the template and the response pointer also is moved to the corresponding location. The process thus continues by examining the next character in the template and the response.
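
Putting the pieces of figures 3A and 3B together, the following self-contained Python sketch walks a template and a response with two pointers and returns the "match" flag used at step 208 together with the retrieved variables. It is an illustration under several assumptions, not the MAWL implementation: markings are written `<MVAR ...>`, `<MITER ...>` and `</MITER>` with `KEY=value` or `KEY="value"` attributes; HTML code is compared a run at a time rather than character by character; the CURSOR and MITER DELIM attributes are ignored; a MITER region stops repeating at the first repetition that fails to match; and a match requires the whole response to be consumed. The function names `tokenize`, `attributes`, `match` and `match_template` are hypothetical.

```python
import re

MARKING = re.compile(r"(<MVAR\b[^>]*>|<MITER\b[^>]*>|</MITER\s*>)", re.IGNORECASE)
ATTR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+?))(?=\s|>|$)')


def tokenize(template):
    """Split the template into (kind, text) tokens: HTML, MVAR, MITER or /MITER."""
    tokens = []
    for i, piece in enumerate(MARKING.split(template)):
        if i % 2 == 0:
            if piece:
                tokens.append(("HTML", piece))
        else:
            name = re.match(r"</?\s*(\w+)", piece).group(1).upper()
            tokens.append(("/" + name if piece.startswith("</") else name, piece))
    return tokens


def attributes(tag):
    """Parse KEY=value / KEY="value" pairs out of a marking."""
    return {k.upper(): q or b for k, q, b in ATTR.findall(tag)}


def match(tokens, response, pos, variables):
    """Match tokens against response from pos; return the new pointer, or None on mismatch."""
    t = 0
    while t < len(tokens):
        kind, text = tokens[t]
        if kind == "HTML":
            # Steps 324-328: HTML code in the template must recur verbatim in the response.
            if not response.startswith(text, pos):
                return None
            pos += len(text)
        elif kind == "MVAR":
            # Steps 304-312: skip to the first DELIM character, storing the text if NAME is set.
            attrs = attributes(text)
            delim = attrs.get("DELIM", "")
            end = pos
            while end < len(response) and response[end] not in delim:
                end += 1
            if attrs.get("NAME"):
                variables[attrs["NAME"]] = response[pos:end]
            pos = end
        elif kind == "MITER":
            # Step 316: the tokens up to the matching /MITER form a sub-template.
            depth, close = 1, t
            while depth:
                close += 1
                if close == len(tokens):
                    raise ValueError("MITER marking without matching /MITER")
                if tokens[close][0] == "MITER":
                    depth += 1
                elif tokens[close][0] == "/MITER":
                    depth -= 1
            sub, items = tokens[t + 1:close], []
            while True:
                item = {}
                new_pos = match(sub, response, pos, item)
                if new_pos is None or new_pos == pos:
                    break  # assumed stopping rule: first repetition that does not match
                items.append(item)
                pos = new_pos
            variables[attributes(text).get("NAME", "MITER")] = items
            t = close  # the t += 1 below moves past the /MITER token
        t += 1
    return pos


def match_template(template, response):
    """Return (match flag for step 208, retrieved variables)."""
    variables = {}
    end = match(tokenize(template), response, 0, variables)
    return end == len(response), variables


template = ('<p>Order for <MVAR DELIM="<"></p><ul><MITER NAME=dates>'
            '<li><MVAR NAME=date DELIM="<"></li></MITER></ul>')
response = "<p>Order for Jane Doe</p><ul><li>02 Mar 1998</li><li>09 Mar 1998</li></ul>"
print(match_template(template, response))
print(match_template(template, response.replace("<ul>", "<ol>"))[0])  # False
```

The second call shows the failure path of paragraph [0033]: once the remote site changes its HTML code, the literal comparison fails, the match flag is false, and the template would have to be rewritten.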

[0035] When implemented in the base web site, the process shown in figure 2 preferably is an application program that utilizes the templates for retrieving and displaying the data. Such an application program may be a procedure that is called by a common gateway interface (CGI) script for retrieving and displaying the data.
[0036] It should be apparent that a template must be constructed for each response from a remote web site that is used with the base web site 102. As noted above, the template is a document written in a mark-up language. Constructing a new template for each response from one or more web sites therefore is much simpler and less time-consuming than constructing a new application program for each such response. Accordingly, the cost of constructing and maintaining a base web site 102 incorporating the invention is significantly less than if the base web site 102 utilized a prior art data retrieval application program.

[0037] In one embodiment of the invention, an application program incorporating the invention may be utilized at the local computer system 100 to directly access the remote site 104. Accordingly, such an application program may include a graphical user interface for entering the form-input. Once the form-input is entered, the program may execute the process shown in figure 2 to display the data in a format specified by the application program. Moreover, instead of merely displaying the data, the application program may utilize the data for any desired purpose. For example, the data may be processed to produce new data, added to a paper printout, or passed as input to another application program.
[0038] In an alternative embodiment, the invention may be implemented as a computer program product for use with a computer system. Such an implementation may include a series of computer instructions fixed either on a tangible medium, such as a computer-readable medium (e.g., a diskette, CD-ROM, ROM, or fixed disk), or transmittable to a computer system via a modem or other interface device, such as a communications adapter connected to a network over a medium. The medium may be either a tangible medium (e.g., optical or analog communications lines) or a medium implemented with wireless techniques (e.g., microwave, infrared or other transmission techniques). The series of computer instructions embodies all or part of the functionality previously described herein with respect to the system. Those skilled in the art should appreciate that such computer instructions can be written in a number of programming languages for use with many computer architectures or operating systems. Furthermore, such instructions may be stored in any memory device, such as semiconductor, magnetic, optical or other memory devices, and may be transmitted using any communications technology, such as optical, infrared, microwave, or other transmission technologies. It is expected that such a computer program product may be distributed as removable media with accompanying printed or electronic documentation (e.g., shrink-wrapped software), preloaded with a computer system (e.g., on system ROM or fixed disk), or distributed from a server or electronic bulletin board over the network (e.g., the Internet or World Wide Web).

[0039] Although various exemplary embodiments of the invention have been disclosed, it should be apparent to those skilled in the art that various changes and modifications can be made which will achieve some of the advantages of the invention without departing from the true scope of the invention. These and other obvious modifications are intended to be covered by the appended claims.



Claims 

1. A method of retrieving data from a first network site 
for use by a second network site, the data having a 
value, the method comprising the steps of: 

generating a request for the data, the request being for transmission to the first network site, the request also being for generating a response from the first network site, the response having a predetermined location therein for the data;

matching a marker in a template with the data in the response, the marker identifying the predetermined location of the data in the response, the marker including a variable; and

assigning the value of the data to the variable.

2. The method as defined by claim 1 wherein the first 
network site is remote from the second network site. 

3. The method as defined by claim 1 wherein the first network site and second network site are on the same network hardware device.

4. The method as defined by any of the preceding claims wherein the response is an HTML web page.

5. The method as defined by any of the preceding claims wherein the second network site includes a display device, the method further including the step of:

displaying the data on the display device.

6. The method as defined by any of the preceding claims wherein the step of matching includes the steps of:

determining the location of the predetermined location in the template; and
locating the predetermined location in the response.

7. The method as defined by any of the preceding claims wherein the template includes MHTML markings.

8. An apparatus for retrieving data from a first network site for use by a second network site, the data having a value, the apparatus comprising:

means for carrying out each step of a method as claimed in any of the preceding claims.

9. A computer program product for use on a computer system for retrieving data from a first network site for use by a second network site, the data having a value, the computer program product comprising a computer usable medium having computer readable program code thereon, the computer readable program code including:

program code for carrying out each step of a method as claimed in any of claims 1 to 7.

10. An apparatus for enabling a second web site to retrieve data from a first web site, the data having a value, the apparatus comprising:

a request generator to generate a request for the data, the request being for transmission from the second web site to the first web site, the request also being for generating a response from the first web site, the response having a predetermined location for the data;

a comparator to match a marker in a template with the data in the response, the marker identifying the predetermined location of the data in the response, the marker including a variable; and

a variable value copier to copy the value of the data to the variable.

11. The apparatus as defined by claim 10 further including:

a display device at the second web site to display the data.

12. The apparatus as defined by claim 10 further including:

a processor for processing the data at the second web site.

13. The apparatus as defined by claim 10 wherein the second web site includes a processor and associated memory, further wherein:

the request generator, the comparator, and the variable value copier are each implemented by the processor and associated memory.